Universal Kriging (UK)#

Universal Kriging (UK) is a variant of Ordinary Kriging for non-stationary conditions, where the mean varies deterministically across locations (a local trend, or drift) while the variance remains constant. This second-order ("weak") stationarity is often a pertinent assumption for environmental exposures. In UK, a trend is first modelled as a function of the coordinates, and the leftover variation (the residuals) is then treated as a random field and added back to the trend to produce the final prediction.

\[\begin{split}\begin{aligned} Z\left(s_{i}\right) &=m\left(s_{i}\right)+e\left(s_{i}\right) \\ Z(\vec{x}) &=\sum_{k=0}^{K} \beta_{k} f_{k}(\vec{x})+\varepsilon(\vec{x}) \end{aligned}\end{split}\]
  • Where the \(f_{k}\) are global functions of position \(\vec{x}\) and the \(\beta_{k}\) are their coefficients.

  • The \(f_{k}\) are called base functions. The \(\varepsilon(\vec{x})\) is the spatially correlated error, which is modelled as before with a variogram, but now fitted only to the residuals that remain after the global trend is removed.
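The trend-plus-residual decomposition above can be sketched numerically. This is an illustrative toy example (synthetic data, with a first-order linear trend in the coordinates as the base functions), not the PyKrige internals:

```python
import numpy as np

## synthetic station data: a deterministic linear trend m(s) plus noise eps(s)
rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(50, 2))          ## 50 synthetic station locations
z = 2.0 + 0.3 * coords[:, 0] - 0.1 * coords[:, 1]   ## trend m(s)
z = z + rng.normal(0, 0.5, size=50)                 ## residual eps(s)

## base functions f_0 = 1, f_1 = x, f_2 = y -> design matrix F
F = np.column_stack([np.ones(len(coords)), coords])
beta, *_ = np.linalg.lstsq(F, z, rcond=None)        ## least-squares trend coefficients
residuals = z - F @ beta                            ## in UK, the variogram is fit to these

print(beta)  ## close to the true [2.0, 0.3, -0.1]
```

In UK proper, the trend coefficients and the variogram of the residuals are estimated together rather than in two separate steps, but the decomposition is the same.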

Note

The definition above comes from a geospatial data science course created by Prof. Zia Ahmed at the University at Buffalo, The State University of New York.

  • Thanks to Prof. Zia Ahmed for the great resource!

Load python modules

[1]:
import context
import salem
import numpy as np
import pandas as pd
import xarray as xr
import geopandas as gpd
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
from pykrige.uk import UniversalKriging

import plotly.express as px
from datetime import datetime

from utils.utils import pixel2poly, plotvariogram, cfcompliant
from context import data_dir
******************************
context imported. Front of path:
/Users/rodell/krige-smoke
/Users/rodell/krige-smoke/docs/source
******************************


Load Data#

Open the reformatted data in the linear, meter-based Lambert projection (EPSG:3347). Again, this is helpful because lat/lon coordinates are poorly suited for measuring distances, which matters for spatial interpolation.
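A quick back-of-the-envelope check of why degrees make poor distance units (approximate spherical-Earth numbers, not tied to EPSG:3347): a degree of latitude is roughly constant, but a degree of longitude shrinks with latitude, so Euclidean distances computed in degrees are distorted.

```python
import numpy as np

R = 6_371_000          ## mean Earth radius in meters (approximate)
deg = np.pi / 180

lat = 49.0             ## roughly southern Canada
meters_per_deg_lat = R * deg                      ## ~111 km everywhere
meters_per_deg_lon = R * deg * np.cos(lat * deg)  ## ~73 km at 49N, shrinking poleward

print(meters_per_deg_lat, meters_per_deg_lon)
```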

[2]:
df = pd.read_csv(str(data_dir) + "/obs/gpm25.csv")
gpm25 = gpd.GeoDataFrame(
    df,
    crs="EPSG:4326",
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
).to_crs("EPSG:3347")
gpm25["Easting"], gpm25["Northing"] = gpm25.geometry.x, gpm25.geometry.y
gpm25.head()
[2]:
Unnamed: 0 id datetime lat lon PM2.5 geometry Easting Northing
0 2 42073 2021-07-16 22:00:00 47.185173 -122.176855 0.862 POINT (3972238.901 1767531.888) 3.972239e+06 1.767532e+06
1 3 53069 2021-07-16 22:00:00 47.190197 -122.177992 1.078 POINT (3972419.863 1768071.699) 3.972420e+06 1.768072e+06
2 12 10808 2021-07-16 22:00:00 40.507316 -111.899188 9.780 POINT (4460286.051 743178.640) 4.460286e+06 7.431786e+05
3 16 85391 2021-07-16 22:00:00 48.454213 -123.283643 0.874 POINT (3964698.001 1931774.531) 3.964698e+06 1.931775e+06
4 21 79095 2021-07-16 22:00:00 47.672130 -122.514183 0.784 POINT (3974631.739 1827689.201) 3.974632e+06 1.827689e+06

Create Grid#

Again we will create the grid we want to interpolate onto.

  • Having the grid in the format of a dataset is helpful for reprojecting our covariates to match the interpolated grid.

[3]:
## define the desired  grid resolution in meters
resolution = 20_000  # grid cell size in meters

## make grid based on dataset bounds and resolution
gridx = np.arange(
    gpm25.bounds.minx.min() - resolution,
    gpm25.bounds.maxx.max() + resolution,
    resolution,
)
gridy = np.arange(
    gpm25.bounds.miny.min() - resolution,
    gpm25.bounds.maxy.max() + resolution,
    resolution,
)

## use salem to create a dataset with the grid.
krig_ds = salem.Grid(
    nxny=(len(gridx), len(gridy)),
    dxdy=(resolution, resolution),
    x0y0=(gpm25.bounds.minx.min(), gpm25.bounds.miny.min()),
    proj="epsg:3347",
    pixel_ref="corner",
).to_dataset()
## print dataset
krig_ds
[3]:
<xarray.Dataset>
Dimensions:  (x: 147, y: 124)
Coordinates:
  * x        (x) float64 3.46e+06 3.48e+06 3.5e+06 ... 6.36e+06 6.38e+06
  * y        (y) float64 4.984e+05 5.184e+05 5.384e+05 ... 2.938e+06 2.958e+06
Data variables:
    *empty*
Attributes:
    pyproj_srs:  +proj=lcc +lat_0=63.390675 +lon_0=-91.8666666666667 +lat_1=4...
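The axis lengths in the dataset above follow directly from the padded bounds and the resolution. A sketch with hypothetical easting bounds (stand-ins, not the actual gpm25 bounds):

```python
import numpy as np

resolution = 20_000                  ## 20 km cells, as above
minx, maxx = 3_460_000, 6_380_000    ## hypothetical easting bounds in meters

## pad the extent by one cell on each side, as in the grid construction above
gridx = np.arange(minx - resolution, maxx + resolution, resolution)
print(len(gridx))  ## number of cells spanning the padded extent
```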

Covariate#

We will use the BlueSky Canada Smoke Forecast (BlueSky) as a covariate for universal kriging with specified drift. The data come from firesmoke.ca.

[4]:

ds = salem.open_xr_dataset(str(data_dir) + "/dispersion1.nc")

Set up specified drift#

For the specified drift we need the BlueSky-modelled PM2.5 concentration at every AQ monitor location, and the BlueSky-modelled PM2.5 concentration on the same grid we are interpolating onto.

BlueSky modeled PM2.5 concentration at AQs location#

[5]:

y = xr.DataArray(
    np.array(df["lat"]),
    dims="ids",
    coords=dict(ids=df.id.values),
)
x = xr.DataArray(
    np.array(df["lon"]),
    dims="ids",
    coords=dict(ids=df.id.values),
)
## interpolate the BlueSky field to the monitor locations
var_points = ds["pm25"].interp(x=x, y=y, method="linear")
# print(var_points)
if len(df.index) == len(var_points.values):
    var_points = var_points.values
else:
    raise ValueError("Lengths don't match")

BlueSky PM2.5 Data on grid#

Now we will transform the BlueSky PM2.5 data onto the grid we are interpolating to. This is fed in as a specified drift array when executing the interpolation.

[6]:
ds_T = krig_ds.salem.transform(ds)
var_array = ds_T["pm25"].values

Plot BlueSky PM2.5#

[7]:
ax = plt.axes(projection=ccrs.Orthographic(-80, 35))
ax.set_global()
ds["pm25"].plot(
    ax=ax,
    transform=ccrs.PlateCarree(),
    levels=[0, 5, 10, 20, 40, 80, 160, 300, 600],
    cmap="Reds",
)
ax.coastlines()
ax.set_extent([-132, -85, 35, 65], crs=ccrs.PlateCarree())
_images/comps-uk-bsp_15_0.png

Setup UK#

[8]:
nlags = 15
variogram_model = "spherical"


startTime = datetime.now()
krig = UniversalKriging(
    x=gpm25["Easting"],  ## x location of aq monitors in lambert conformal
    y=gpm25["Northing"],  ## y location of aq monitors in lambert conformal
    z=gpm25["PM2.5"],  ## measured PM 2.5 concentrations at locations
    drift_terms=["specified"],
    variogram_model=variogram_model,
    nlags=nlags,
    specified_drift=[var_points],  ## BlueSky PM2.5 at aq monitors
)
print(f"UK build time {datetime.now() - startTime}")
UK build time 0:03:30.516893

PyKrige will optimize most parameters based on the user-defined empirical model and the number of bins.

  • I tested several empirical models and bin sizes and found (for this case study) that a spherical model with 15 bins was optimal based on the output statistics.

  • The literature supports spherical models for geospatial interpolation applications over other methods.
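For reference, the spherical variogram model has the following shape: it rises from the nugget and levels off at the sill once the lag distance exceeds the range. The nugget, sill, and range values below are illustrative placeholders, not the parameters fitted above.

```python
import numpy as np

def spherical(h, nugget, sill, range_):
    """Spherical semivariance at lag distance h (nugget shown at all lags > 0)."""
    h = np.asarray(h, dtype=float)
    partial = sill - nugget
    inside = nugget + partial * (1.5 * h / range_ - 0.5 * (h / range_) ** 3)
    return np.where(h < range_, inside, sill)

h = np.array([0.0, 50_000.0, 200_000.0, 500_000.0])  ## lag distances in meters
gamma = spherical(h, nugget=1.0, sill=25.0, range_=200_000.0)
print(gamma)  ## non-decreasing, flat at the sill beyond the range
```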

[9]:
plotvariogram(krig)
_images/comps-uk-bsp_19_0.png

Execute UK#

Interpolate the data to our grid using UK with specified drift, where the drift is given by the BlueSky PM2.5 values at the monitor locations and on the interpolated grid.

[10]:
## cap extreme BlueSky values so the drift array stays within the range
## (plus a small buffer) seen at the monitor locations
var_array[var_array > np.max(var_points)] = np.max(var_points) + 20

startTime = datetime.now()
z, ss = krig.execute("grid", gridx, gridy, specified_drift_arrays=[var_array])
print(f"UK execution time {datetime.now() - startTime}")
UK_pm25 = np.where(z < 0, 0, z)  ## clamp unphysical negative PM2.5 predictions to zero

krig_ds["UK_pm25"] = (("y", "x"), UK_pm25)
UK execution time 0:00:07.096796

Plot UK#

Convert the data to polygons so it can be plotted on a slippy mapbox map. This is not strictly necessary, but it makes for a nicer interactive plot :)

[11]:
polygons, values = pixel2poly(gridx, gridy, UK_pm25, resolution)
pm25_model = gpd.GeoDataFrame(
    {"Modelled PM2.5": values}, geometry=polygons, crs="EPSG:3347"
).to_crs("EPSG:4326")

fig = px.choropleth_mapbox(
    pm25_model,
    geojson=pm25_model.geometry,
    locations=pm25_model.index,
    color="Modelled PM2.5",
    color_continuous_scale="jet",
    center={"lat": 50.0, "lon": -110.0},
    zoom=2.5,
    mapbox_style="carto-positron",
    opacity=0.6,
)
fig.update_layout(margin=dict(l=0, r=0, t=30, b=10))
fig.update_traces(marker_line_width=0)
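The pixel-to-polygon step used above can be sketched as follows. This is a hypothetical stand-in for the project's pixel2poly helper, not its actual implementation: each grid cell is turned into a square whose corners sit half a resolution from the cell center.

```python
## illustrative sketch, not the project's pixel2poly code
def cell_corners(cx, cy, resolution):
    """Corner coordinates of a square cell centered at (cx, cy)."""
    half = resolution / 2
    return [
        (cx - half, cy - half),  ## lower-left
        (cx + half, cy - half),  ## lower-right
        (cx + half, cy + half),  ## upper-right
        (cx - half, cy + half),  ## upper-left
    ]

print(cell_corners(10_000, 10_000, 20_000))
```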